Body fluid volume estimation device, body fluid volume estimation method, and non-transitory computer-readable medium

ABSTRACT

A body fluid volume estimation device includes a pre-training unit, a transfer learning unit, and an estimation unit. The pre-training unit performs pre-training on face images of multiple patients by using, as supervised information, information indicating the body fluid volume of each patient when the face images are captured. The transfer learning unit further performs transfer learning on multiple face images of one specific patient after the pre-training, and constructs a trained model. The estimation unit estimates, by inputting a face image of the one specific patient to the trained model, a body fluid volume at the point in time at which the face image is captured. Estimating a body fluid volume from a face image by machine learning allows the estimate to be used for assistance such as supporting decision making of a user.

INCORPORATION BY REFERENCE

This application is a Continuation of U.S. application Ser. No. 18/216,806, filed on Jun. 30, 2023, which is based upon and claims the benefit of priority from Japanese patent application No. 2022-109664, filed on Jul. 7, 2022, the disclosure of which is incorporated herein in its entirety by reference.

TECHNICAL FIELD

The present disclosure relates to a body fluid volume estimation device, a body fluid volume estimation method, and a non-transitory computer-readable medium.

BACKGROUND ART

In recent years, development of techniques for determining a health state of a person and presence or absence of a disease from an appearance of the person, for example, an image of a face, has been progressing (Published Japanese Translation of PCT International Publication for Patent Application, No. 2022-512044, Japanese Unexamined Patent Application Publication No. 2020-199072, and Japanese Unexamined Patent Application Publication No. 2005-65812). In general, it is known that a change occurs in a form of a face and a lower limb according to a health state of a person. An increase in capacity of a body is detected as a swelling, and a decrease in capacity is detected as a decrease in firmness of skin. A swelling mainly refers to a state where excessive water is accumulated in a gap of tissue and a body fluid volume increases, and occurs by various causes such as a central disease, a respiratory/circulatory disease, a renal disease, an orthopedic disease, a metabolic disease, and a malignant tumor. Further, a decrease in firmness of skin occurs by a decrease in water in a body, and occurs in various states such as dehydration and heatstroke.

For example, a swelling occurs when waste matter and water in a body cannot be removed due to a decrease in renal function, and thus excessive water in a body is currently removed by dialysis treatment. Therefore, it is important for a dialysis patient to maintain water in a body (i.e., a body fluid volume) within a desirable range, and thus an intake of water and salt needs to be limited in daily life.

As factors in a change in weight, a change in body fluid volume, fat mass, and muscle mass are conceivable. Since dialysis is performed for approximately four hours, and a change in fat mass and muscle mass conceivably does not occur during the dialysis, a change in weight by the dialysis conceivably reflects a change in body fluid volume. Further, for elderly patients having chronic heart failure, it is also unlikely that muscle mass is increased by exercise or that an increase in amount of food leads to an increase in fat, and thus a change in weight may conceivably represent a change in body fluid volume. Furthermore, also in dehydration, fat mass and muscle mass do not change in a short period of time, and thus a change in weight may conceivably represent a change in body fluid volume.

In order to recognize a state of a patient having a disease accompanied by a swelling, a degree of the swelling of the patient, i.e., a body fluid volume, is required to be measured. As a general swelling estimation technique, for example, a technique of measuring a degree of a swelling by a pitting test by a medical staff member has been proposed (J. Chen et al., “Camera-Based Peripheral Edema Measurement Using Machine Learning,” in Proc. IEEE Int. Conf. Healthcare Informatics (ICHI), 2018, pp. 115-122). In this technique, an image is captured during a pitting test of a lower limb, and a degree of a peripheral swelling of a lower limb or the like is estimated from the image by a support vector machine (SVM) or a convolutional neural network (CNN).

Further, a technique (A. G. Smith et al., “Objective determination of peripheral edema in heart failure patients using short-wave infrared molecular chemical imaging,” Journal of Biomedical Optics, vol. 26, No. 10, pp. 105002, 2021) of measuring a degree of a swelling, based on an image in which a peripheral portion such as hands and feet of a patient is captured, by using a short-wave infrared (SWIR) camera has been proposed. In this technique, by using a property in which an absorption coefficient of water, collagen, and fat increases in a specific spectrum region of the SWIR camera in presence of a swelling, a swelling level can be estimated from a spectrum component.

SUMMARY

As described above, in order to manage a degree of a swelling of a patient, i.e., a body fluid volume, on a daily basis, it is desirable that the patient himself/herself measures a body fluid volume, and manages an intake of water according to a measurement result. However, the techniques for measuring a body fluid volume described above need to be performed by a professional medical staff member, and there is also a restriction that specific equipment such as a special camera is needed.

Meanwhile, in order for a patient to autonomously limit an intake of water and salt in daily life, a technique that enables a patient to measure a body fluid volume on a daily basis is required to be established.

The present disclosure has been made in view of the circumstances described above, and an example object of the invention is to estimate a body fluid volume of a patient from a face image of the patient.

In a first example aspect of the present disclosure, a body fluid volume estimation device includes: a pre-training unit configured to perform pre-training on face images of a plurality of patients by using, as supervised information, information indicating a body fluid volume of each of the plurality of patients when the face images of the plurality of patients are captured; a transfer learning unit configured to further perform transfer learning on multiple face images of one specific patient after the pre-training, and construct a trained model; and an estimation unit configured to estimate, by inputting a face image of the one specific patient to the trained model, a body fluid volume of the one specific patient at a point in time at which the face image of the one specific patient is captured.

In a second example aspect of the present disclosure, a body fluid volume estimation method includes: performing pre-training on face images of a plurality of patients by using, as supervised information, information indicating a body fluid volume of each of the plurality of patients when the face images of the plurality of patients are captured; further performing transfer learning on multiple face images of one specific patient after the pre-training, and constructing a trained model; and estimating, by inputting a face image of the one specific patient to the trained model, a body fluid volume of the one specific patient at a point in time at which the face image of the one specific patient is captured.

In a third example aspect of the present disclosure, a non-transitory computer-readable medium stores a program causing a computer to execute: processing of performing pre-training on face images of a plurality of patients by using, as supervised information, information indicating a body fluid volume of each of the plurality of patients when the face images of the plurality of patients are captured; processing of further performing transfer learning on multiple face images of one specific patient after the pre-training, and constructing a trained model; and processing of estimating, by inputting a face image of the one specific patient to the trained model, a body fluid volume of the one specific patient at a point in time at which the face image of the one specific patient is captured. The present disclosure makes it possible to estimate a body fluid volume of a patient from a face image of the patient.

BRIEF DESCRIPTION OF DRAWINGS

The above and other aspects, features and advantages of the present disclosure will become more apparent from the following description of certain exemplary embodiments when taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a diagram schematically illustrating a configuration of a body fluid volume estimation device according to a first example embodiment;

FIG. 2 is a diagram illustrating a modification example of the body fluid volume estimation device according to the first example embodiment;

FIG. 3 is a diagram illustrating a modification example of the body fluid volume estimation device according to the first example embodiment;

FIG. 4 is a diagram illustrating a sequence of processing of the body fluid volume estimation device according to the first example embodiment;

FIG. 5 is a diagram illustrating an overview of pre-training by weight-aware supervised momentum contrast (WeightSupMoCo) according to the first example embodiment;

FIG. 6 is a diagram illustrating an overview of face images before and after dialysis in an embedding space after the pre-training;

FIG. 7 is a diagram illustrating an overview of transfer learning according to the first example embodiment;

FIG. 8 is a diagram illustrating estimation performance of classification before and after dialysis and weight prediction in each technique;

FIG. 9 is a diagram illustrating a transition of estimated weight and ground-truth weight of a patient having the highest estimation performance;

FIG. 10 is a diagram illustrating a transition of estimated weight and ground-truth weight of a patient having the lowest estimation performance;

FIG. 11 is a diagram illustrating a change in accuracy of classification before and after dialysis with respect to the number of days of transfer learning data;

FIG. 12 is a diagram illustrating a change in mean absolute error (MAE) of weight estimation with respect to the number of days of the transfer learning data; and

FIG. 13 is a diagram illustrating a configuration example of a computer.

EXAMPLE EMBODIMENT

Example embodiments of the present disclosure will be described below with reference to the drawings. In each of the drawings, the same elements will be denoted by the same reference signs, and duplicate description will be omitted as necessary.

First Example Embodiment

A body fluid volume estimation device 100 according to a first example embodiment will be described. The body fluid volume estimation device 100 is configured to perform contrastive learning on face images of patients by weight-aware supervised momentum contrast (WeightSupMoCo), which is a new technique, based on information representing a body fluid volume of each patient. Then, the body fluid volume estimation device 100 can estimate a body fluid volume of a patient being an estimation target, for example, a change in the body fluid volume, by inputting a face image of the patient to a trained model.

FIG. 1 schematically illustrates a configuration of the body fluid volume estimation device 100 according to the first example embodiment. The body fluid volume estimation device 100 includes a pre-training unit 1, a transfer learning unit 2, and an estimation unit 3.

The pre-training unit 1 receives pre-training data Dp described below, and performs pre-training, i.e., machine learning on face images that uses a body fluid volume label indicating a body fluid volume of a patient as a supervised label.

The transfer learning unit 2 reads, after completion of the pre-training, transfer learning data Dt described below that are formed of information about one specific patient being an estimation target of a body fluid volume, and performs transfer learning by using the face images included therein. In this way, the transfer learning unit 2 can construct a trained model M that is used for estimating a body fluid volume based on a separately input face image of the one specific patient.

The pre-training data Dp and the transfer learning data Dt may be provided from the outside of the body fluid volume estimation device 100 to the body fluid volume estimation device 100 via various communication means such as a network.

Further, the pre-training data Dp and the transfer learning data Dt may be stored in advance in a storage unit provided in the body fluid volume estimation device 100. FIG. 2 schematically illustrates a modification example of the body fluid volume estimation device 100 according to the first example embodiment. As illustrated in FIG. 2, the body fluid volume estimation device 100 may further include a storage unit 4. The pre-training data Dp and the transfer learning data Dt are stored in advance in the storage unit 4. The pre-training unit 1 can appropriately read the pre-training data Dp from the storage unit 4. The transfer learning unit 2 can appropriately read the transfer learning data Dt from the storage unit 4. The information stored in the storage unit 4 may be provided from the outside of the storage unit 4 to the storage unit 4 via various communication means such as a network.

The estimation unit 3 estimates, by inputting a face image IN of the one specific patient captured at any point in time to the constructed trained model M, a body fluid volume of the patient at the point in time at which the face image IN is captured.

The face image IN may be provided from the outside of the body fluid volume estimation device 100 to the body fluid volume estimation device 100 via various communication means such as a network. In this case, the body fluid volume estimation device 100 is installed in a medical institution, for example, and thus a body fluid volume of a target patient can be estimated in the medical institution by transmitting a face image captured by the patient to the medical institution.

Further, the face image IN may be an image captured by an image capturing unit provided in the body fluid volume estimation device 100. FIG. 3 schematically illustrates a modification example of the body fluid volume estimation device 100 according to the first example embodiment. As illustrated in FIG. 3, the body fluid volume estimation device 100 may further include an image capturing unit 5. The image capturing unit 5 may be any of various image capturing devices that can acquire an image, and can appropriately capture a face image of one specific patient at any point in time. The captured face image IN is appropriately input to the estimation unit 3. Note that the captured face image IN may be input to the estimation unit 3 via another non-illustrated processing unit, or may be read by the estimation unit 3 after the face image IN is stored in the storage unit 4.

As illustrated in FIG. 3, the pre-training unit 1, the transfer learning unit 2, the estimation unit 3, the storage unit 4, and the image capturing unit 5 are provided in one body fluid volume estimation device 100, and thus the body fluid volume estimation device 100 can be mounted on various portable terminals, for example, a smartphone on which an image capturing device such as a camera is mounted.

Hereinafter, in the present example embodiment, it is assumed that a body fluid volume of a patient is estimated by estimating whether a swelling occurs in a face of the patient, and by predicting weight of the patient at a capturing point in time of a face image. Further, in the present example embodiment, it is assumed that a dialysis patient is a target, and whether a face image is captured before dialysis, when a swelling occurs, or after dialysis, when a swelling does not occur, is estimated in order to clearly distinguish whether a swelling occurs in a face of the patient.

In this way, when a state in which a swelling does not occur, i.e., a state after dialysis, is set as a normal state, whether a body fluid volume of a patient is significantly changed from the normal state can be detected by estimating whether a swelling occurs in an input face image. When a swelling is estimated to occur in an input face image of a patient being an estimation target, for example, the estimation unit 3 may output, as a detection result, an increase in body fluid volume as compared to the normal state. Further, the estimation unit 3 may estimate a change amount of a body fluid volume from estimated weight, and output an estimation result. Such an estimation result and detection result may be displayed on, for example, a non-illustrated display device (for example, an output unit 1007 in FIG. 13), and may be appropriately displayed on a screen when the body fluid volume estimation device 100 is mounted on a smartphone.

Next, the pre-training data Dp will be described. The pre-training data Dp are configured to include a plurality of records, each including a face image of a patient, label information (may also be simply referred to as a dialysis label) representing whether the patient is before or after dialysis when the face image is captured, and label information (may also be simply referred to as a weight label) indicating weight when the face image is captured. In other words, one record includes a face image of one patient, a dialysis label, and a weight label.

For example, in a case of a dialysis patient, a value acquired by subtracting a body fluid volume removed from a body of the patient by the dialysis from a value of a weight label associated with a face image of the patient before the dialysis may be used as a weight label associated with a face image after the dialysis. In this case, as compared to a case where weight of the patient after the dialysis is measured, a change in body fluid volume can be more accurately reflected in a weight label associated with a face image of the patient after the dialysis.

Further, for example, a value acquired by adding a body fluid volume removed from a body of a patient by dialysis to a weight label associated with a face image of the patient after the dialysis may be used as a weight label associated with a face image before the dialysis. In this case, as compared to a case where weight of the patient before the dialysis is measured, a change in body fluid volume can be more accurately reflected in a weight label associated with a face image of the patient before the dialysis. Further, a standard weight of a patient after dialysis, i.e., a patient in which a swelling does not occur, may be determined in advance for each patient, and a value acquired by adding a body fluid volume removed from a body of the patient by the dialysis to the standard weight may be used as a weight label associated with a face image before the dialysis of the patient.
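The label derivations described above can be sketched as follows. This is a minimal illustration rather than the device's implementation; the function names are hypothetical, and it assumes that the removed body fluid volume is expressed as a mass in kilograms (roughly 1 kg per liter of fluid) so that it can be added to or subtracted from weight directly.

```python
def post_dialysis_weight_label(pre_weight_kg: float, removed_fluid_kg: float) -> float:
    """Weight label for a post-dialysis face image, derived from the
    pre-dialysis weight label minus the fluid removed by dialysis."""
    return pre_weight_kg - removed_fluid_kg


def pre_dialysis_weight_label(base_weight_kg: float, removed_fluid_kg: float) -> float:
    """Weight label for a pre-dialysis face image, derived by adding the
    removed fluid to a base weight. The base may be the post-dialysis weight
    label or a per-patient standard weight determined in advance."""
    return base_weight_kg + removed_fluid_kg


# Hypothetical values: 62.5 kg before dialysis, 2.0 kg of fluid removed.
after_label = post_dialysis_weight_label(62.5, 2.0)
before_label = pre_dialysis_weight_label(after_label, 2.0)
```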

Next, the transfer learning data Dt will be described. The transfer learning data Dt are configured to include a plurality of records, each including a face image of one specific patient being an estimation target of classification before and after dialysis and a prediction target of weight in the body fluid volume estimation device 100, a dialysis label, and a weight label. A configuration of each of the records is similar to that of the pre-training data Dp.

In the present example embodiment, it is assumed that the number of the records included in the pre-training data Dp is greater than the number of the records included in the transfer learning data Dt.

A dialysis label is a discrete label indicating whether a patient is before or after dialysis when a face image is captured, and may be, for example, “1” when an associated face image is a face image captured before dialysis with a swelling, and “0” when an associated face image is a face image captured after dialysis without a swelling.

In contrast, a weight label of a dialysis patient is provided as a numerical value, i.e., a continuous label indicating weight of the dialysis patient when an associated face image is captured.
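As an illustration of the record structure described above, one record could be represented as follows. The class name, field names, and the use of a file path for the face image are assumptions made for this sketch; only the label semantics (“1” before dialysis, “0” after dialysis, weight as a continuous value) come from the description.

```python
from dataclasses import dataclass

BEFORE_DIALYSIS = 1  # face image captured before dialysis (with a swelling)
AFTER_DIALYSIS = 0   # face image captured after dialysis (without a swelling)


@dataclass(frozen=True)
class Record:
    image_path: str      # hypothetical: where the face image is stored
    dialysis_label: int  # discrete label: BEFORE_DIALYSIS or AFTER_DIALYSIS
    weight_label: float  # continuous label: weight when the image is captured

    def __post_init__(self) -> None:
        # Reject anything other than the two discrete dialysis labels.
        if self.dialysis_label not in (BEFORE_DIALYSIS, AFTER_DIALYSIS):
            raise ValueError("dialysis_label must be 0 or 1")


# Hypothetical pair of records for one dialysis session of one patient.
records = [
    Record("patient01_day03_before.png", BEFORE_DIALYSIS, 62.5),
    Record("patient01_day03_after.png", AFTER_DIALYSIS, 60.5),
]
```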

A face image included in each record may be acquired by, for example, the pre-training unit 1 reading an original image, captured in advance, that includes a face of a dialysis patient, and performing appropriate image processing. For example, the pre-training unit 1 performs face detection on an original image, and then estimates a central portion of a face. Then, the pre-training unit 1 may perform data augmentation such as resizing of an image, horizontal flipping, color conversion of an image, and gray scaling on the extracted face image as necessary, and may thus set the extracted face image as a face image of each record. Such data augmentation may be performed by the pre-training unit 1, or may be performed by an image processing unit provided separately from the pre-training unit 1.
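Two of the augmentations named above, horizontal flipping and gray scaling, can be sketched on a tiny image held as nested lists of RGB tuples. This is only an illustration: a practical system would use an image library, and the luminance coefficients below (the common ITU-R BT.601 weights) are an assumption, since the description does not specify how gray scaling is computed.

```python
def horizontal_flip(image):
    """Mirror an image left to right. `image` is a list of rows,
    each row a list of (r, g, b) tuples."""
    return [list(reversed(row)) for row in image]


def to_grayscale(image):
    """Convert RGB pixels to single luminance values (BT.601 weights, assumed)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


# A hypothetical 1x2 image: one white pixel and one black pixel.
tiny = [[(255, 255, 255), (0, 0, 0)]]
flipped = horizontal_flip(tiny)
gray = to_grayscale(tiny)
```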

Next, a flow and a processing content of contrastive learning processing of the body fluid volume estimation device 100 will be described. As described above, the body fluid volume estimation device 100 performs the pre-training by WeightSupMoCo by using the pre-training data Dp about an unspecified large number of patients as input data, and the transfer learning by using the transfer learning data Dt about one specific patient being an estimation target as input data. FIG. 4 illustrates a sequence of processing of the body fluid volume estimation device 100 according to the first example embodiment.

Step S1: Pre-Training

In the pre-training, the body fluid volume estimation device 100 performs the pre-training by WeightSupMoCo by using the pre-training data Dp about an unspecified large number of patients as input data.

In the pre-training, contrastive learning based on Momentum Contrast (MoCo, Kaiming He et al., “Momentum Contrast for Unsupervised Visual Representation Learning”, in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2020, pp. 9729-9738) is performed. However, self-supervised learning without using a supervised label is performed in the contrastive learning by original MoCo, whereas, in the present example embodiment, as described above, the contrastive learning by WeightSupMoCo is performed by using a dialysis label and a weight label as supervised labels.

FIG. 5 illustrates an overview of the pre-training by WeightSupMoCo according to the first example embodiment. In WeightSupMoCo, an encoder (feature extractor) and a momentum encoder are used, and a fully connected layer referred to as a projection head is provided in a subsequent stage of each of the encoders. A feature value being output from the projection head in the subsequent stage of the encoder is referred to as a query, and a feature value being output from the projection head in the subsequent stage of the momentum encoder is referred to as a key. The keys are accumulated as a queue in a dictionary, and the contrastive learning is performed by using the query and the queue.
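The two MoCo mechanisms mentioned above, the momentum encoder and the queue of keys, can be sketched as follows. The sketch treats encoder parameters as a flat list of floats and follows the exponential moving average update of the cited MoCo paper; the momentum coefficient 0.999 is the default reported in that paper and is an assumption here, as are all class and function names.

```python
from collections import deque


def momentum_update(encoder_params, momentum_params, m=0.999):
    """Move the momentum-encoder parameters slowly toward the encoder
    parameters: theta_k <- m * theta_k + (1 - m) * theta_q."""
    return [m * pk + (1.0 - m) * pq
            for pq, pk in zip(encoder_params, momentum_params)]


class KeyQueue:
    """Fixed-size first-in, first-out dictionary of keys. When the queue is
    full, enqueuing new keys discards the oldest ones."""

    def __init__(self, max_size):
        self._items = deque(maxlen=max_size)

    def enqueue(self, keys):
        self._items.extend(keys)

    def keys(self):
        return list(self._items)

    def __len__(self):
        return len(self._items)


# Hypothetical usage with a tiny queue of size 3.
queue = KeyQueue(max_size=3)
queue.enqueue(["k1", "k2"])
queue.enqueue(["k3", "k4"])  # "k1" is discarded
updated = momentum_update([1.0], [0.0], m=0.9)
```

Because the queue is decoupled from the mini-batch, its size can be chosen independently of the batch size, which is the property the description relies on.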

Meanwhile, contrastive learning techniques that use a normal encoder instead of the momentum encoder, such as Simple Framework for Contrastive Learning of Visual Representations (SimCLR, T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, “A simple framework for contrastive learning of visual representations,” Proc. Int. Conf. Machine Learning (ICML), 2020, pp. 1597-1607), have also been known. In such a technique, the size of the dictionary is equivalent to the mini-batch size, and the technique has thus been known to have a disadvantage of a small number of samples with which the contrastive learning is performed.

Further, as a technique for the contrastive learning using a discrete supervised label such as a dialysis label, for example, a technique using Supervised Contrastive (SupCon) Loss (P. Khosla et al., “Supervised Contrastive Learning”, in Proc. Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 18661-18673, 2020) as a loss function has been known. As a technique for the contrastive learning using a continuous supervised label such as a weight label, for example, a technique using y-Aware InfoNCE Loss (B. Dufumier et al., “Contrastive Learning with Continuous Proxy Meta-Data for 3D MRI Classification”, in Int. Conf. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2021, pp. 58-68) as a loss function has been known. However, the techniques using SupCon Loss and y-Aware InfoNCE Loss are SimCLR-type techniques, and have a disadvantage of a small number of samples with which the contrastive learning is performed, as described above.

In contrast, in the present example embodiment, a MoCo-type WeightSupMoCo Loss is used as a loss function when the contrastive learning is performed by using a discrete dialysis label and a continuous weight label as supervised labels. Thus, unlike the SimCLR-type techniques, the contrastive learning can be performed with an increased dictionary size, and acquisition of better feature representation can be achieved.

Next, WeightSupMoCo according to the present example embodiment will be described in more detail. In WeightSupMoCo, when labels before and after dialysis are the same, contrastive learning is performed in such a way that feature representations of face images are brought closer according to a degree of similarity of their weight labels. The reason is that a degree of a swelling is conceivably more similar when states of presence and absence of a swelling are the same, i.e., between face images before dialysis or between face images after dialysis, and is conceivably more similar still when weights are similar.

Hereinafter, a specific description will be given. When a weight label is y, a dialysis label is a, and a feature value being output from the projection head is z, a WeightSupMoCo Loss is represented by the following expression.

$$L_i = \sum_{k \in A(i)} \frac{w_\sigma(y_i, y_k) \cdot \delta_{a_i = a_k}}{\sum_{j \in A(i)} w_\sigma(y_i, y_j) \cdot \delta_{a_i = a_j}} \cdot \frac{\exp(z_i \cdot z_k / \tau)}{\sum_{j \in A(i)} \exp(z_i \cdot z_j / \tau)}$$ [Mathematical 1]

Note that A(i) is the set formed of the queries in a mini-batch and the keys in the queue, except for the i-th record. w_σ(·,·) is a function representing a degree of similarity of weight labels, and the radial basis function (RBF) kernel described in B. Dufumier et al., “Contrastive Learning with Continuous Proxy Meta-Data for 3D MRI Classification”, in Int. Conf. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2021, pp. 58-68, is used herein. σ is a hyperparameter of the RBF kernel, and σ=3.0 herein. τ is a temperature parameter, and τ=0.1 herein.

$$\delta_{a_i = a_j}$$ [Mathematical 2]

is a function that returns “1” when a label a_i and a label a_j have the same value (a_i=a_j), and returns “0” in the other cases. Further, a dictionary size of the queue is assumed to be 1024. Since the feature value z is normalized to a Euclidean norm of 1, an inner product of feature values is equivalent to a cosine similarity.
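The terms of the expression above can be made concrete with a small pure-Python sketch for a single query i. Several points are assumptions and not taken from the description: the function names; the Gaussian form exp(-(y_i - y_k)^2 / (2σ^2)) used for the RBF kernel; and the use of the negative logarithm of the softmax term, which is conventional for InfoNCE-type losses such as the cited y-Aware InfoNCE but does not appear in the expression as printed. Feature values are assumed to be already normalized to unit length.

```python
import math


def rbf_weight(y_i, y_k, sigma=3.0):
    """Similarity of two weight labels (Gaussian RBF kernel, assumed form)."""
    return math.exp(-((y_i - y_k) ** 2) / (2.0 * sigma ** 2))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weight_sup_moco_loss(z_i, y_i, a_i, others, sigma=3.0, tau=0.1):
    """Loss for one query. `others` lists (z, y, a) for every element of A(i),
    i.e., the other queries in the mini-batch and the keys in the queue."""
    logits = [dot(z_i, z) / tau for z, _, _ in others]
    # Log of the softmax denominator, computed stably.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    # Weight is zero unless the dialysis labels match (the delta term).
    weights = [rbf_weight(y_i, y, sigma) if a == a_i else 0.0
               for _, y, a in others]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no element shares the dialysis label of the query
    return -sum(w / total * (l - log_denominator)
                for w, l in zip(weights, logits))


# Hypothetical example: one element with the same dialysis label and the
# same weight, and one element with a different dialysis label.
others = [([1.0, 0.0], 60.0, 1), ([0.0, 1.0], 60.0, 0)]
loss = weight_sup_moco_loss([1.0, 0.0], 60.0, 1, others)
```

In this form, elements with the same dialysis label and a closer weight label receive a larger share of the normalized weight, so the query is pulled more strongly toward them in the embedding space.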

FIG. 6 illustrates an overview of face images before and after dialysis in an embedding space after the pre-training. As illustrated in FIG. 6, as a result of the pre-training, face images having the same dialysis label “before dialysis” are distributed at a short distance from one another in the embedding space. In contrast, face images having the dialysis label “after dialysis” are distributed in positions relatively distant from the face images having the dialysis label “before dialysis”.

Further, in the embedding space, face images having the same dialysis label are distributed according to a degree of similarity of a weight label, i.e., are distributed at a shorter distance with more similar weight labels and are distributed at a farther distance with less similar weight labels.

Step S2: Transfer Learning

Subsequently, the transfer learning unit 2 performs the transfer learning by using a result PL of the pre-training and by using, as input data, the transfer learning data Dt about one specific patient being an estimation target. In the present example embodiment, in the transfer learning, only the encoder is used without using the projection head used in the pre-training. FIG. 7 illustrates an overview of the transfer learning according to the first example embodiment. In the transfer learning, one linear layer formed of a classification layer and a regression layer is added to a subsequent stage of the encoder. In the classification layer, the number of dimensions of output in classification before and after dialysis using the transfer learning data Dt being data about one specific patient is 2. In the regression layer, the number of dimensions of output in weight prediction is 1.

Then, fine tuning that performs transfer learning on the encoder and all of the added linear layers is performed by using the data about the one specific patient. Herein, as a loss function in the fine tuning, a cross entropy loss is used for classification before and after dialysis, and a mean square error loss is used for weight prediction.
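The two fine-tuning losses named above can be sketched for a small batch as follows. The description does not state how the two losses are combined; summing them with equal weight is an assumption of this sketch, as are the function names and the batch layout.

```python
import math


def cross_entropy(logits, target):
    """Cross entropy for one sample from 2-dimensional classification output;
    `target` is the dialysis label (0 or 1)."""
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[target]


def fine_tuning_loss(batch):
    """`batch` lists (class_logits, dialysis_label, weight_pred, weight_target).
    Returns mean cross entropy plus mean squared error (equal weighting assumed)."""
    n = len(batch)
    ce = sum(cross_entropy(logits, a) for logits, a, _, _ in batch) / n
    mse = sum((pred - y) ** 2 for _, _, pred, y in batch) / n
    return ce + mse


# Hypothetical single-sample batch: uninformative logits, weight error of 0.5
# (in normalized weight units).
example_loss = fine_tuning_loss([([0.0, 0.0], 1, 0.5, 0.0)])
```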

Step S3: Estimation

The trained model M is constructed by performing the procedures described above. By inputting the face image IN of the one specific patient being the estimation target to the trained model M, the estimation unit 3 can estimate whether the patient shown in the face image is before or after dialysis (presence or absence of a swelling), predict weight of the patient, and output the estimation result OUT.

By the procedures described above, the trained model can be constructed by performing the pre-training using data about an unspecified large number of patients and the transfer learning using data about one specific patient. Then, by inputting a face image of the one specific patient to the trained model, whether the patient is before or after dialysis (presence or absence of a swelling) can be estimated, and weight can be predicted.

Next, a comparison experiment with general techniques was performed in order to verify significance of the technique in the present example embodiment. Hereinafter, an experiment condition and an experiment result will be described.

In the pre-training, a randomly selected 80% of data acquired from multiple patients was used as the pre-training data Dp, and the remaining 20% was used as validation data.

In the transfer learning and estimation, multiple groups of face images before and after dialysis of one specific patient were acquired on several different dialysis opportunities. One of the groups was used as the test data. Among the groups other than the test data, a randomly selected 80% of the groups was used as the transfer learning data Dt and the remaining 20% was used as validation data. Note that an opportunity for a patient to receive dialysis is referred to as a dialysis opportunity, but dialysis is normally performed within one day (generally, approximately four hours), and thus the dialysis opportunity will be simply referred to as a dialysis day below. Further, in the present experiment, it is assumed that the one specific patient used for the transfer learning and the estimation is not included among the plurality of patients used for the pre-training.
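The per-patient split described above can be sketched as follows: one dialysis day is held out as test data, and the remaining days are divided 80/20 into transfer learning data and validation data. The function name, the use of integer day identifiers, the fixed random seed, and the rounding of the 80% boundary are assumptions of this sketch.

```python
import random


def split_dialysis_days(day_ids, test_day, train_fraction=0.8, seed=0):
    """Split the dialysis days of one patient into train/val/test groups.
    All images of one dialysis day stay in the same group."""
    if test_day not in day_ids:
        raise ValueError("test_day must be one of day_ids")
    remaining = [d for d in day_ids if d != test_day]
    rng = random.Random(seed)  # fixed seed only for reproducibility
    rng.shuffle(remaining)
    n_train = round(train_fraction * len(remaining))
    return {
        "train": remaining[:n_train],  # transfer learning data Dt
        "val": remaining[n_train:],    # validation data
        "test": [test_day],            # held-out dialysis day
    }


# Hypothetical patient with 11 dialysis days; day 10 is held out for testing.
split = split_dialysis_days(list(range(11)), test_day=10)
```

Splitting by dialysis day rather than by image keeps the before and after images of one session together, so the test day is never partially seen during transfer learning.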

The face images used in this experiment were resized to 224×224 pixels, and data augmentation including horizontal flipping, color conversion, and gray scaling was performed. In the present experiment, training of 100 epochs in the pre-training and 20 epochs in the transfer learning was performed. However, the model from the epoch in which the validation data error in the pre-training was the smallest was used for the transfer learning in a subsequent stage. As an optimization algorithm, Adam (D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv preprint arXiv:1412.6980, 2014) was used, a learning rate of the pre-training was set as 10⁻⁴, and a learning rate of the transfer learning was set as 10⁻³. In the prediction of weight in the transfer learning, the learning and the prediction were performed with weight being normalized to a mean of 0 and a variance of 1.
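The weight normalization mentioned above can be sketched as a standard z-score transform. Fitting the mean and standard deviation on the training weights only, and using the population standard deviation, are assumptions of this sketch; the function names are hypothetical.

```python
import statistics


def fit_normalizer(train_weights):
    """Return the mean and standard deviation used for normalization."""
    mean = statistics.fmean(train_weights)
    std = statistics.pstdev(train_weights)  # population standard deviation
    if std == 0.0:
        raise ValueError("weights must not all be identical")
    return mean, std


def normalize(weight, mean, std):
    """Map a weight to the scale with mean 0 and variance 1."""
    return (weight - mean) / std


def denormalize(value, mean, std):
    """Map a model output on the normalized scale back to a weight."""
    return value * std + mean


# Hypothetical training weights in kilograms.
mean, std = fit_normalizer([58.0, 60.0, 62.0])
normalized = [normalize(w, mean, std) for w in [58.0, 60.0, 62.0]]
```

A predicted value on the normalized scale is passed through `denormalize` before it is compared with ground-truth weight, so that errors such as MAE are reported in the original unit.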

In the present experiment, the following techniques were used as comparative examples for comparison with the technique according to the present example embodiment.

First Comparative Example

In order to verify the effectiveness of the pre-training, a technique (without the pre-training) for performing training only with data about one specific patient, without performing the pre-training, was set as a first comparative example.

Second Comparative Example

In order to verify the effectiveness of using a supervised label, SimCLR, which is a self-supervised learning technique, was set as a second comparative example.

Third Comparative Example

In order to verify the effectiveness of using a supervised label, MoCo, which is a self-supervised learning technique, was set as a third comparative example.

Fourth Comparative Example

Similarly, as a pre-training technique using general supervised learning, a classification technique using a dialysis label based on a cross entropy loss was set as a fourth comparative example.

Fifth Comparative Example

In order to verify the effectiveness of the pre-training based on the technique (WeightSupMoCo) according to the present example embodiment, a case using the pre-training technique based on SupCon, with a dialysis label as discrete supervised information, was set as a fifth comparative example.

Sixth Comparative Example

Similarly, in order to verify the effectiveness of the pre-training based on the technique (WeightSupMoCo) according to the present example embodiment, a case using the pre-training technique based on y-Aware InfoNCE, with a weight label as continuous supervised information, was set as a sixth comparative example.

As evaluation indicators in the classification before and after dialysis, Accuracy, the area under the receiver operating characteristic curve (ROC-AUC), and the area under the precision-recall curve (PR-AUC) of an estimated label were used. Further, as evaluation indicators in the weight prediction, a mean absolute error (MAE), a root mean squared error (RMSE), and a correlation coefficient (CorrCoef) between a predicted value of weight and ground-truth data were used.
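The evaluation indicators for the weight prediction may be computed, for example, as in the following minimal sketch. The function names are hypothetical, and the Pearson correlation coefficient is assumed for CorrCoef because the text above does not state which correlation coefficient was used.

```python
import math


def _check(predicted, truth):
    """Validate that both sequences are non-empty and of equal length."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_error(predicted, truth):
    """MAE: mean of the absolute differences, in the unit of the weight."""
    _check(predicted, truth)
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)


def root_mean_squared_error(predicted, truth):
    """RMSE: square root of the mean of the squared differences."""
    _check(predicted, truth)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth))


def correlation_coefficient(predicted, truth):
    """Pearson correlation coefficient (an assumption for CorrCoef)."""
    _check(predicted, truth)
    mean_p = sum(predicted) / len(predicted)
    mean_t = sum(truth) / len(truth)
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, truth))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    if var_p == 0.0 or var_t == 0.0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    predicted = [60.0, 61.0, 62.0]  # hypothetical predicted weights in kg
    truth = [60.5, 61.0, 61.5]      # hypothetical ground-truth weights in kg
    print(mean_absolute_error(predicted, truth))
    print(root_mean_squared_error(predicted, truth))
    print(correlation_coefficient(predicted, truth))
```

MAE and RMSE are expressed in the same unit as weight (kg), whereas CorrCoef indicates how well the predicted transition follows the ground-truth transition regardless of a constant offset.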

Experiment Result

FIG. 8 illustrates the estimation performance of the classification before and after dialysis and the weight prediction in each technique. It could be confirmed from FIG. 8 that WeightSupMoCo according to the present example embodiment can perform the classification before and after dialysis and predict weight with higher performance than any of the comparative techniques.

First, it could be understood from comparison between the technique without the pre-training (“without pre-training” in FIG. 8) and the other techniques that the effectiveness of performing the pre-training was significant. Further, the effectiveness of using a dialysis label and weight of a dialysis patient as supervised data about a body fluid volume could be confirmed from comparison between SimCLR and MoCo, being the self-supervised learning techniques, and WeightSupMoCo according to the present example embodiment. Furthermore, the effectiveness of using a dialysis label and a weight label in a cooperative manner and the effectiveness of increasing a size of a dictionary by a MoCo type could also be confirmed from comparison between WeightSupMoCo and each of SupCon and y-Aware InfoNCE.

Next, a transition of estimated weight and ground-truth weight for each dialysis opportunity will be considered. FIG. 9 illustrates a transition of estimated weight and ground-truth weight of a patient having the highest estimation performance. It could be confirmed from FIG. 9 that a transition of weight could be predicted with high performance. Further, FIG. 10 illustrates a transition of estimated weight and ground-truth weight of a patient having the lowest estimation performance. It could be confirmed that, although some patients had a dialysis opportunity on which the performance of the weight estimation was low, an increase and a decrease in weight before and after dialysis could be suitably predicted. Thus, it could be understood that weight representing a degree of a swelling could be predicted from a face image of a dialysis patient.

Next, the estimation performance of the classification before and after dialysis and the weight prediction with respect to the number of days of training data used in the transfer learning will be considered. FIG. 11 illustrates a change in Accuracy, being the estimation performance of the classification before and after dialysis, with respect to the number of days of the transfer learning data. FIG. 12 illustrates a change in MAE, being the estimation performance of the weight prediction, with respect to the number of days of the transfer learning data. In FIGS. 11 and 12, a technique without performing the pre-training (without the pre-training), a technique for performing the pre-training based on WeightSupMoCo, and a technique (ImageNet->WeightSupMoCo) for performing the pre-training based on WeightSupMoCo on the model pre-trained with ImageNet (J. Deng, W. Dong, R. Socher, L. Li, et al., “ImageNet: A Large-Scale Hierarchical Image Database”, Proc. IEEE/CVF conf. computer vision and pattern recognition (CVPR), 2009) were compared.

In all of the techniques, the estimation performance improved as the number of days of the transfer learning data increased. Further, when the technique without the pre-training was compared with the other techniques, it could be confirmed that the effectiveness of performing the pre-training was significant.

Further, it could be confirmed that ImageNet->WeightSupMoCo was the most accurate and that it was effective to perform the pre-training based on WeightSupMoCo on the model pre-trained with ImageNet.

It is desirable that the number of days of training data about one specific patient constituting the transfer learning data is as small as possible. In this respect, it can be understood that, by performing the pre-training (ImageNet->WeightSupMoCo), estimation with relatively high performance, in which Accuracy of the classification before and after dialysis is 76.8% and MAE in the weight prediction is 0.65 kg, can be achieved by using the transfer learning data from a relatively short period of three days.

As described above, the body fluid volume estimation device 100 can not only determine presence or absence of a swelling from a face image of a patient, but can also predict weight of the dialysis patient and estimate a body fluid volume in a body, i.e., a degree of a swelling of the dialysis patient. Further, since training is performed by using a dialysis label and a weight label, and estimation is performed by using the training result, a variation in swelling level and the like depending on a viewpoint of a doctor can be reduced unlike a previous diagnosis, and a more objective diagnostic result can be acquired.

Further, the number of pieces of data, such as a face image, presence or absence of a swelling, and weight, that can be acquired from only one specific patient is limited. According to the present configuration, however, the pre-training is performed by using data acquired from an unspecified large number of patients, and thus a sufficient amount of training data can be used in order to achieve high estimation performance.

As illustrated in FIG. 3, by incorporating an image capturing means for a face image into the body fluid volume estimation device 100, a patient can capture his/her own face image at any place (for example, at home) and at any point in time, perform estimation by the body fluid volume estimation device 100, and thus recognize presence or absence of occurrence of his/her own swelling and a degree of the swelling, i.e., a body fluid volume. In this way, a dialysis patient can appropriately and easily confirm his/her own body fluid volume at an appropriate time. Further, the present configuration is advantageous over a general inspection technique in that involvement of health care professionals and special mechanical equipment are unnecessary.

According to the present configuration, whether a state of a patient is closer to a state in which a swelling occurs or a state in which a swelling does not occur can be recognized by using a prediction score representing presence or absence of a swelling, the prediction score being acquired from an estimation result of presence or absence of a swelling. Further, by using estimated weight, a patient himself/herself can recognize a degree of a swelling, and estimate the amount of water in a body. Furthermore, a patient himself/herself can perform an adjustment of the amount of food and a water intake, selection of a menu of a meal, an adjustment of the amount of medicine (such as a diuretic), and the like, based on an estimated score representing presence or absence of a swelling, estimated weight, and the like.

Even when a patient cannot go to a medical institution such as a hospital (for example, in a remote area), by transmitting information indicating a body fluid volume estimated by the body fluid volume estimation device 100 to a doctor and the like in the medical institution, the doctor and the like can recognize the body fluid volume of the patient with high performance, and can also give an accurate diagnosis and life guidance.

Other Example Embodiment

Note that the present disclosure is not limited to the example embodiment described above, and may be appropriately modified without departing from the scope of the present disclosure. For example, in the example embodiment described above, a dialysis label and a weight label are used as supervised labels, but another discrete label and another continuous label may be appropriately used in combination. Further, description is given on an assumption that one kind of a discrete label and one kind of a continuous label are used, but any number of kinds of discrete labels and any number of kinds of continuous labels can be used.

In the example embodiment described above, the focus is on a dialysis patient, but it is needless to say that the body fluid volume estimation device 100 may be applied to estimation of an occurrence situation of a swelling of a patient having another disease in which a swelling occurs.

In the example embodiment described above, description is given on an assumption that presence or absence of occurrence of a swelling is estimated, which is merely an exemplification. A situation of a change in face image other than a swelling due to a disease and the like, for example, a change in size and shape of a face other than a swelling due to heatstroke and the like, a change in complexion, and the like can also be estimated.

Although the configuration of the body fluid volume estimation device 100 has been described above as a configuration of hardware in the example embodiment described above, the present disclosure is not limited to the example embodiment. The processing in the body fluid volume estimation device 100 can also be achieved by causing a central processing unit (CPU) to execute a computer program. Further, the program described above is stored by using a non-transitory computer-readable medium of various types, and can be supplied to a computer. The non-transitory computer-readable medium includes a tangible storage medium of various types. Examples of the non-transitory computer-readable medium include a magnetic recording medium (for example, a flexible disk, a magnetic tape, and a hard disk drive), a magneto-optical recording medium (for example, a magneto-optical disk), a CD-read only memory (CD-ROM), a CD-R, a CD-R/W, and a semiconductor memory (for example, a mask ROM, a programmable ROM (PROM), an erasable PROM (EPROM), a flash ROM, and a random access memory (RAM)). Further, a program may be supplied to a computer by a transitory computer-readable medium of various types. Examples of the transitory computer-readable medium include an electrical signal, an optical signal, and an electromagnetic wave. The transitory computer-readable medium may supply the program to the computer via a wired communication path such as an electric wire and an optical fiber or a wireless communication path.

One example of the computer is described below. As the computer, various computers such as a dedicated computer and a personal computer (PC) can be used. However, the computer does not need to be physically single, and may be plural when distributed processing is performed.

FIG. 13 illustrates a configuration example of a computer. A computer 1000 in FIG. 13 includes a central processing unit (CPU) 1001, a read only memory (ROM) 1002, and a random access memory (RAM) 1003, and these are connected to one another via a bus 1004. Note that description of OS software and the like for operating the computer will be omitted, but the computer includes the OS software and the like as a matter of course.

An input/output interface 1005 is also connected to the bus 1004. For example, an input unit 1006 formed of a keyboard, a mouse, a sensor, and the like, an output unit 1007 formed of a display formed of a CRT, an LCD, and the like, a headphone, a speaker, and the like, a storage unit 1008 formed of a hard disk and the like, a communication unit 1009 formed of a modem, a terminal adapter, and the like, and the like are connected to the input/output interface 1005.

The CPU 1001 executes various programs stored in the ROM 1002, various types of processing according to various programs loaded from the storage unit 1008 into the RAM 1003, and, for example, processing of each unit of the body fluid volume estimation device 100 in the example embodiment described above. Note that a graphics processing unit (GPU) may be provided, and similarly to the CPU 1001, the GPU may execute various programs stored in the ROM 1002, various types of processing according to various programs loaded from the storage unit 1008 into the RAM 1003, and, for example, processing of each unit of the body fluid volume estimation device 100 in the present example embodiment described above. Note that the GPU is suitable for use for performing pieces of typical processing in parallel, and, by applying the GPU to processing and the like in a neural network described below, a processing speed can be improved as compared to the CPU 1001. Data and the like needed for the CPU 1001 and the GPU to perform various types of processing are also appropriately stored in the RAM 1003.

For example, the communication unit 1009 performs communication processing via the Internet (not illustrated), transmits data provided from the CPU 1001, and outputs data received from a communication partner to the CPU 1001, the RAM 1003, and the storage unit 1008. The storage unit 1008 communicates with the CPU 1001, and stores and deletes information. The communication unit 1009 performs communication processing of an analog signal or a digital signal with another device.

A drive 1010 is connected to the input/output interface 1005 as necessary, and, for example, a magnetic disk 1011, an optical disk 1012, a flexible disk 1013, a semiconductor memory 1014, or the like is appropriately mounted, and a computer program read therefrom is installed in the storage unit 1008 as necessary.

The present disclosure has been described above, and the present disclosure can also be described as follows.

(Supplementary Note 1)

A body fluid volume estimation device including:

-   -   a pre-training unit configured to perform pre-training on face images of a plurality of patients by using, as supervised information, information indicating a body fluid volume of each of the plurality of patients when the face images of the plurality of patients are captured;
    -   a transfer learning unit configured to further perform transfer learning on a plurality of face images of one specific patient after the pre-training, and construct a trained model; and
    -   an estimation unit configured to estimate, by inputting a face image of the one specific patient to the trained model, a body fluid volume of the one specific patient at a point in time at which the face image of the one specific patient is captured.

(Supplementary Note 2)

The body fluid volume estimation device according to Supplementary Note 1, in which

-   -   the information indicating the body fluid volume of each of the plurality of patients when the face images of the plurality of patients are captured includes information indicating presence or absence of a swelling in the face image of each of the plurality of patients, and information indicating weight of each of the plurality of patients, and
    -   the estimation unit estimates presence or absence of a swelling and predicts weight of the one specific patient at a point in time at which the face image of the one specific patient is captured by inputting the face image of the one specific patient to the trained model.

(Supplementary Note 3)

The body fluid volume estimation device according to Supplementary Note 2, in which the estimation unit

-   -   detects, based on an estimation result of the presence or absence of the swelling of the one specific patient, whether the body fluid volume of the one specific patient is changed from a preset standard body fluid volume of the one specific patient, and
    -   acquires, from a prediction result of the weight, a difference in the body fluid volume of the one specific patient from the standard body fluid volume.

(Supplementary Note 4)

The body fluid volume estimation device according to Supplementary Note 2 or 3, in which

-   -   the information indicating the presence or absence of the swelling in the face image of each of the plurality of patients is label information representing the presence or absence of the swelling, and
    -   the pre-training unit performs pre-training in such a way that feature values of face images having the same label information representing the presence or absence of the swelling are brought closer as the information indicating the weight is more similar.

(Supplementary Note 5)

The body fluid volume estimation device according to any one of Supplementary Notes 2 to 4, in which the pre-training unit performs pre-training by weight-aware supervised momentum contrast (WeightSupMoCo).

(Supplementary Note 6)

The body fluid volume estimation device according to any one of Supplementary Notes 2 to 5, in which

-   -   the plurality of patients and the one specific patient are patients who receive dialysis, and
    -   weight of each of the plurality of patients and the one specific patient after dialysis associated with a case without a swelling has a value acquired by subtracting a body fluid volume removed by dialysis from weight of each of the plurality of patients and the one specific patient before dialysis associated with a case with a swelling.

(Supplementary Note 7)

The body fluid volume estimation device according to any one of Supplementary Notes 2 to 5, in which

-   -   the plurality of patients and the one specific patient are patients who receive dialysis, and
    -   weight of each of the plurality of patients and the one specific patient before dialysis associated with a case with a swelling has a value acquired by adding a body fluid volume removed by dialysis to weight of each of the plurality of patients and the one specific patient after dialysis associated with a case without a swelling.

(Supplementary Note 8)

The body fluid volume estimation device according to any one of Supplementary Notes 2 to 5, in which

-   -   the plurality of patients and the one specific patient are patients who receive dialysis, and
    -   weight of each of the plurality of patients and the one specific patient before dialysis associated with a case with a swelling has a value acquired by adding a body fluid volume removed by dialysis to preset standard weight of each of the plurality of patients and the one specific patient.

(Supplementary Note 9)

The body fluid volume estimation device according to any one of Supplementary Notes 1 to 8, further including a storage unit configured to store the face images of the plurality of patients to be used for pre-training, the information indicating the body fluid volume of the plurality of patients when the face images of the plurality of patients are captured, and the plurality of face images of the one specific patient to be used for the transfer learning, in which

-   -   the pre-training unit reads, from the storage unit, the face images of the plurality of patients and information indicating the body fluid volume when the face images of the plurality of patients are captured, and performs pre-training, and
    -   the transfer learning unit reads, from the storage unit, the plurality of face images of the one specific patient, and performs transfer learning.

(Supplementary Note 10)

The body fluid volume estimation device according to any one of Supplementary Notes 1 to 9, further including an image capturing unit,

-   -   in which a face image of the one specific patient captured by the image capturing unit is input to the estimation unit.

(Supplementary Note 11)

A body fluid volume estimation method including:

-   -   performing pre-training on face images of a plurality of patients by using, as supervised information, information indicating a body fluid volume of each of the plurality of patients when the face images of the plurality of patients are captured;
    -   further performing transfer learning on a plurality of face images of one specific patient after the pre-training, and constructing a trained model; and
    -   estimating, by inputting a face image of the one specific patient to the trained model, a body fluid volume of the one specific patient at a point in time at which the face image of the one specific patient is captured.

(Supplementary Note 12)

A program causing a computer to execute:

-   -   processing of performing pre-training on face images of a plurality of patients by using, as supervised information, information indicating a body fluid volume of each of the plurality of patients when the face images of the plurality of patients are captured;
    -   processing of further performing transfer learning on a plurality of face images of one specific patient after the pre-training, and constructing a trained model; and
    -   processing of estimating, by inputting a face image of the one specific patient to the trained model, a body fluid volume of the one specific patient at a point in time at which the face image of the one specific patient is captured.

While the disclosure has been particularly shown and described with reference to embodiments thereof, the disclosure is not limited to these embodiments. It will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the claims.

What is claimed is:
 1. A body fluid volume estimation device comprising: an image capturing device that captures patient images; at least one memory configured to store instructions; and at least one processor configured to execute the instructions to: perform pre-training on face images of multiple patients by using, as supervised information, information indicating a body fluid volume of each of the multiple patients when the face images of the multiple patients are captured; perform transfer learning on multiple face images of one specific patient after the pre-training, and construct a trained model; estimate, by inputting a face image of the one specific patient to the trained model, a body fluid volume of the one specific patient at a point in time at which the face image of the one specific patient is captured; and output an estimation result to a display device.
 2. The body fluid volume estimation device according to claim 1, wherein the information indicating the body fluid volume of each of the multiple patients when the face images of the multiple patients are captured includes information indicating presence of a swelling in the face image of each of the multiple patients, and information indicating weight of each of the multiple patients, and the at least one processor is further configured to execute the instructions to estimate presence or absence of a swelling and weight of the one specific patient at a point in time at which the face image of the one specific patient is captured by inputting the face image of the one specific patient to the trained model.
 3. The body fluid volume estimation device according to claim 2, wherein the at least one processor is further configured to execute the instructions to: detect, based on a prediction result of the presence or absence of the swelling of the one specific patient, whether the body fluid volume of the one specific patient is changed from a preset standard body fluid volume of the one specific patient, and acquire, from an estimation result of the weight, a difference in the body fluid volume of the one specific patient from the preset standard body fluid volume.
 4. The body fluid volume estimation device according to claim 2, wherein the information indicating the presence or absence of the swelling in the face image of each of the multiple patients is label information representing the presence of the swelling, and the at least one processor is further configured to execute the instructions to perform pre-training in such a way that feature values of face images having the same label information representing the presence or absence of the swelling are brought closer as the information indicating the weight is more similar.
 5. The body fluid volume estimation device according to claim 2, wherein the at least one processor is further configured to execute the instructions to perform pre-training by weight-aware supervised momentum contrast (WeightSupMoCo).
 6. The body fluid volume estimation device according to claim 2, wherein the multiple patients and the one specific patient are patients who receive dialysis, and weight of each of the multiple patients and the one specific patient after dialysis associated with a case without a swelling has a value acquired by subtracting a body fluid volume removed by dialysis from weight of each of the multiple patients and the one specific patient before dialysis associated with a case with a swelling.
 7. The body fluid volume estimation device according to claim 2, wherein the multiple patients and the one specific patient are patients who receive dialysis, and weight of each of the multiple patients and the one specific patient before dialysis associated with a case with a swelling has a value acquired by adding a body fluid volume removed by dialysis to weight of each of the multiple patients and the one specific patient after dialysis associated with a case without a swelling.
 8. The body fluid volume estimation device according to claim 2, wherein the multiple patients and the one specific patient are patients who receive dialysis, and weight of each of the multiple patients and the one specific patient before dialysis associated with a case with a swelling has a value acquired by adding a body fluid volume removed by dialysis to preset standard weight of each of the multiple patients and the one specific patient.
 9. The body fluid volume estimation device according to claim 1, wherein the at least one processor is further configured to execute the instructions to: store the face images of the multiple patients to be used for pre-training, the information indicating the body fluid volume of the multiple patients when the face images of the multiple patients are captured, and the multiple face images of the one specific patient to be used for the transfer learning, read the face images of the multiple patients and information indicating the body fluid volume when the face images of the multiple patients are captured, and perform pre-training, and read the multiple face images of the one specific patient, and perform transfer learning.
 10. A body fluid volume estimation method for a device including an image capturing device comprising: performing pre-training on face images of multiple patients by using, as supervised information, information indicating a body fluid volume of each of the multiple patients when the face images of the multiple patients are captured; further performing transfer learning on multiple face images of one specific patient after the pre-training, and constructing a trained model; estimating, by inputting a face image of the one specific patient captured by the image capturing device to the trained model, a body fluid volume of the one specific patient at a point in time at which the face image of the one specific patient is captured; and outputting an estimation result to a display device.
 11. A non-transitory computer-readable medium storing a program causing a computer including an image capturing device to execute: processing of performing pre-training on face images of multiple patients by using, as supervised information, information indicating a body fluid volume of each of the multiple patients when the face images of the multiple patients are captured; processing of further performing transfer learning on multiple face images of one specific patient after the pre-training, and constructing a trained model; processing of estimating, by inputting a face image of the one specific patient captured by the image capturing device to the trained model, a body fluid volume of the one specific patient at a point in time at which the face image of the one specific patient is captured; and processing of outputting an estimation result to a display device.